
    Efficient Heap Implementation with a Fixed-Size Linear Systolic Array


    Design and Evaluation of Approaches for Automatic Chinese Text Categorization

    In this paper, we propose and evaluate approaches to categorizing Chinese texts, consisting of term extraction, term selection, term clustering, and text classification. We propose a scalable approach that uses frequency counts to identify the left and right boundaries of possibly significant terms. We use a combination of term selection and term clustering to reduce the dimension of the vector space to a practical level. While the huge number of possible Chinese terms makes most machine learning algorithms impractical, results from an experiment on a CAN news collection show that our approach could dramatically reduce the dimension to 1,200 while maintaining approximately the same level of classification accuracy. We also studied and compared the performance of three well-known classifiers, the Rocchio linear classifier, the naive Bayes probabilistic classifier, and the k-nearest neighbors (kNN) classifier, when applied to categorizing Chinese texts. Overall, kNN achieved the best accuracy, about 78.3%, but required large amounts of computation time and memory when classifying new texts. Rocchio was very time- and memory-efficient and achieved a high level of accuracy, about 75.4%. In a practical implementation, Rocchio may be a good choice.
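    The Rocchio linear classifier compared above computes one representative (centroid) vector per class and assigns a new text to the class whose centroid is most similar, which is why its classification step is so cheap. A minimal sketch of this standard technique, not the authors' implementation (the term vectors and class labels below are illustrative):

```python
import numpy as np

def train_rocchio(X, y, classes):
    """Compute one unit-length centroid per class from term vectors."""
    centroids = {}
    for c in classes:
        centroid = X[y == c].mean(axis=0)
        norm = np.linalg.norm(centroid)
        centroids[c] = centroid / norm if norm > 0 else centroid
    return centroids

def classify(x, centroids):
    """Assign x to the class whose centroid has the highest cosine similarity."""
    x = x / (np.linalg.norm(x) or 1.0)
    return max(centroids, key=lambda c: float(x @ centroids[c]))
```

Classifying a new text costs one dot product per class, which matches the abstract's observation that Rocchio needs far less time and memory at classification time than kNN, whose cost grows with the number of training documents.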

    External-Memory Computational Geometry

    (c) 1993 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
    In this paper we give new techniques for designing efficient algorithms for computational geometry problems that are too large to be solved in internal memory. We use these techniques to develop optimal and practical algorithms for a number of important large-scale problems. We discuss our algorithms primarily in the context of single-processor/single-disk machines, a domain in which they are not only the first known optimal results but also of tremendous practical value. Our methods also produce the first known optimal algorithms for a wide range of two-level and hierarchical multilevel memory models, including parallel models. The algorithms are optimal both in terms of I/O cost and internal computation.
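    Work in this setting is typically analyzed in the standard two-level I/O model, where N is the problem size, M the internal memory size, and B the block transfer size (all in items), and cost is counted in block transfers rather than CPU steps. The well-known external-memory sorting bound of Θ((N/B) log_{M/B}(N/B)) I/Os is the usual reference point. A rough, hedged calculator for that bound (the parameter values are illustrative, and the ceilings are a simplification):

```python
import math

def sort_io_bound(N, M, B):
    """Approximate external sorting cost: (N/B) * ceil(log_{M/B}(N/B)) I/Os."""
    n_blocks = math.ceil(N / B)   # blocks that must be touched per pass
    fanout = M // B               # blocks that fit in internal memory at once
    passes = max(1, math.ceil(math.log(n_blocks, fanout)))
    return n_blocks * passes

# For a billion items, a million items of memory, and 4096-item blocks,
# sorting takes only a small constant number of passes over the data.
```

The point of the model is that a scan costs N/B I/Os and sorting costs only a few multiples of that, so an algorithm whose I/O cost matches the sorting bound is effectively as good as possible for problems that reduce to sorting.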

    Techniques for solving geometric problems on mesh-connected computers

    No full text
    The contributions of this thesis are twofold: (i) we solve optimally some problems on conventional Mesh-Connected Computers that were not previously solved optimally, and (ii) we present new algorithms for several geometric problems on more realistic models. On conventional Mesh-Connected Computers, in which the n processors are arranged as a (multidimensional) array, we present a new technique for optimally performing n searches on a class of hierarchical DAGs, which leads to the first optimal mesh algorithms for the three-dimensional convex hull and convex polyhedra intersection problems, settling an open problem posed in (AW88) and in (MS88b). The previous algorithms were a log n factor away from optimality. On the more realistic models (RAM/ARRAY(d)), in which the d-dimensional Mesh-Connected Computer is of fixed size p and is attached to a random access machine, we present new algorithms for several geometric problems, which achieve the same speedup for a problem of arbitrary size n ≄ p as for a problem of size p. The problems include computing the all nearest neighbors of a planar set of points, the measure and perimeter of a union of rectangles, the visibility of a set of nonintersecting line segments from a point, and dominance counting between two planar sets of points. All of these problems have sequential time complexity Θ(n log n) and have O(p^{1/d}) solutions for a problem of size p on a d-dimensional Mesh-Connected Computer with p processors. Hence, the RAM/ARRAY(d) achieves a speedup of O(p^{1-1/d} log p) for a problem of size p. Our contribution is to show that this speedup can be achieved for arbitrarily large problem sizes.
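    The speedup figure follows from a one-line calculation: the sequential cost for a size-p instance is about p log p, the mesh solves it in O(p^{1/d}) steps, so the ratio is (p log p)/p^{1/d} = p^{1-1/d} log p. A hedged numeric illustration of that arithmetic (the values of p and d are arbitrary, and constants are ignored):

```python
import math

def mesh_speedup(p, d):
    """Speedup of a d-dimensional p-processor mesh on a size-p instance:
    sequential cost ~ p * log2(p), mesh cost ~ p**(1/d) steps."""
    sequential = p * math.log2(p)
    mesh = p ** (1 / d)
    return sequential / mesh   # equals p**(1 - 1/d) * log2(p)

# For a 2-D mesh with p = 2**20 processors, the ratio is 2**10 * 20 = 20480.
```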

    On the Parallel-Decomposability of Geometric Problems


    Improving Linear Classifier for Chinese Text Categorization

    No full text
    The goal of this paper is to derive extra representatives for each class to compensate for the potential weakness of linear classifiers that compute only one representative per class. To evaluate the effectiveness of our approach, we compared it with the linear classifier produced by the Rocchio algorithm and with the k-nearest neighbor (kNN) classifier. Experimental results show that our approach improved the linear classifier and achieved micro-averaged accuracy close to that of kNN, with much less classification time. Furthermore, identifying new representatives for the linear classifier can suggest how to reorganize the structure of the classes.
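    One common way to derive extra representatives, consistent with the idea above though not necessarily the authors' exact method, is to cluster each class's training documents and keep one centroid per cluster; a new text is then assigned to the class owning the nearest representative. A minimal sketch using a tiny k-means (the cluster count and toy vectors are illustrative):

```python
import numpy as np

def class_representatives(X, k, iters=10, seed=0):
    """Derive k representatives for one class via a small k-means."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each document vector to its nearest representative
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def classify_nearest(x, reps_by_class):
    """Return the class owning the representative nearest to x."""
    return min(reps_by_class,
               key=lambda c: np.linalg.norm(reps_by_class[c] - x, axis=1).min())
```

With several representatives per class, a multi-modal class (documents falling into distinct topical subgroups) no longer gets averaged into a single misleading centroid, which is exactly the weakness of one-representative linear classifiers described above.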

    Passive Forgery Detection for JPEG Compressed Image based on Block Size Estimation and Consistency Analysis

    No full text
    As most digital cameras and image capture devices have no modules for embedding a watermark or signature, passive forgery detection, which aims to detect traces of tampering without embedded information, has become the major focus of recent research on JPEG compressed images. However, our investigation shows that current approaches for detecting and localizing tampered areas are very sensitive to image content, and suffer from high false detection rates when localizing tampered areas in images with intensive edges and textures. In this paper, we present an effective approach that overcomes this problem, using reliable estimation and analysis of block sizes from the block artifacts resulting from the JPEG compression process. We first propose an enhanced cross difference filter to strengthen block artifacts and reduce interference from edges and textures, and then integrate techniques from random sampling, voting, and maximum likelihood estimation to improve the accuracy of block size estimation. We develop two different random sampling strategies for block size estimation: one for estimating the primary JPEG block size, and the other for consistency analysis of local block sizes. Local blocks whose JPEG block sizes differ from the primary block size are classified as tampered blocks. We finally perform a refinement process to eliminate false detections and fill in undetected tampered blocks. Experiments over various tampering methods, such as copy-and-paste, image completion, and composite tampering, show that our approach can effectively detect and localize tampered areas and is not sensitive to image content such as edges and textures.
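    The underlying signal, that JPEG block artifacts produce periodic intensity discontinuities aligned to the compression grid, can be illustrated with a much simpler estimator than the paper's enhanced cross difference filter: average the horizontal absolute differences per column, fold that profile by each candidate period, and pick the period whose boundary column stands out most. A hedged sketch only (the paper's actual method adds the cross difference filter, random sampling, voting, and maximum likelihood estimation):

```python
import numpy as np

def estimate_block_size(img, candidates=(4, 8, 16, 32)):
    """Estimate the JPEG block size from periodic blocking artifacts.

    img: 2-D grayscale array. For each candidate period we fold the mean
    column-wise absolute-difference profile and score how strongly the
    block-boundary column stands out from the rest of the fold.
    """
    img = np.asarray(img, dtype=float)
    diff = np.abs(np.diff(img, axis=1)).mean(axis=0)  # per-column edge energy
    best, best_score = None, -np.inf
    for p in candidates:
        usable = (len(diff) // p) * p
        if usable < p:
            continue
        fold = diff[:usable].reshape(-1, p).mean(axis=0)
        # boundary between blocks of size p falls at offset p-1 of the fold
        score = fold[p - 1] - np.delete(fold, p - 1).mean()
        if score > best_score:
            best, best_score = p, score
    return best
```

Running the same estimator over local windows and comparing each window's period against the globally dominant one is the simplest version of the consistency analysis described above: windows that disagree are candidates for tampered regions.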